Regression with Categorical Predictors. Dummy Variables and Interactions

 

> load("Auto.rda")

> attach(Auto)

> country = as.factor(origin)

 

> plot(weight,mpg)

> plot(weight,mpg,col=country)

 

# Country appears to be an important variable that is not numerical.

 

> reg = lm(mpg ~ country)

> summary(reg)

 

Call:

lm(formula = mpg ~ country)

 

Residuals:

    Min      1Q  Median      3Q     Max

-12.451  -5.034  -1.034   3.649  18.966

 

Coefficients:

            Estimate Std. Error t value Pr(>|t|)   

(Intercept)  20.0335     0.4086  49.025   <2e-16 ***

country2      7.5695     0.8767   8.634   <2e-16 ***

country3     10.4172     0.8276  12.588   <2e-16 ***

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 

Residual standard error: 6.396 on 389 degrees of freedom

Multiple R-squared:  0.3318,    Adjusted R-squared:  0.3284

F-statistic:  96.6 on 2 and 389 DF,  p-value: < 2.2e-16

 

# R created dummy variables country2 and contry3

 

# Including INTERACTIONS

 

> reg = lm(mpg ~ weight*country)

 

# This is a short way to include weight, country, and all interactions

 

> summary(reg)

 

Call:

lm(formula = mpg ~ weight * country)

 

Residuals:

     Min       1Q   Median       3Q      Max

-13.4928  -2.7715  -0.3895   2.2397  15.5163

 

Coefficients:

                  Estimate Std. Error t value Pr(>|t|)   

(Intercept)      4.315e+01  1.186e+00  36.378  < 2e-16 ***

weight          -6.854e-03  3.423e-04 -20.020  < 2e-16 ***

country2         1.125e+00  2.878e+00   0.391  0.69616   

country3         1.111e+01  3.574e+00   3.109  0.00202 **

weight:country2  3.575e-06  1.111e-03   0.003  0.99743   

weight:country3 -3.865e-03  1.541e-03  -2.508  0.01255 * 

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 

 

> reg = lm(mpg ~ weight*country)

> Yhat = fitted.values(reg)                           # Save Y-hat, the miles per gallon predicted by our new model

> points(weight,Yhat,col=country,lwd=3)            

# Adding 3 fitted regression lines to the plot, one for each country! Col = color, lwd = line width